Survival Analysis of Colon Cancer Data using Quantile Regression
Vidya Bhargavi M¹, Sireesha Veeramachaneni¹*, Venkateswara Rao Mudunuru²
¹GITAM Institute of Science, GITAM (deemed to be) University, Visakhapatnam, Andhra Pradesh, India.
²Department of Mathematics and Statistics, University of South Florida, Tampa, FL, USA.
*Corresponding Author E-mail: vsirisha80@gmail.com
ABSTRACT:
Quantile regression has emerged as a robust alternative to the commonly used regression models. In survival analysis, quantile regression offers more flexible modelling of survival data without the constraints of traditional models. Unlike the Cox proportional hazards model or accelerated failure time models, quantile regression allows the covariate coefficients to vary across quantiles of the survival time. In this research, we modelled colon cancer data with traditional survival regression methods and with quantile regression, and compared the results.
KEYWORDS: Colon Cancer, Survival Analysis, Quantile Survival Regression, Kaplan-Meier Analysis, Cox Proportional Hazards Function, Parametric Survival Analysis.
INTRODUCTION:
Today, survival analysis is used in almost every scientific field. The term "survival analysis" refers to methods for assessing the likelihood of events such as death or failure following treatment of subjects. Survival analysis deals with time-to-event data subject to censoring. Censoring describes observations whose event times are not fully observed1. In a follow-up study, the time to recurrence is observed directly for some patients. For other patients, we know only the time of their last check-up, i.e., their disease-free survival (DFS) up to that point, because these patients may change physicians, move away, or leave the study for other reasons. Such patients are referred to as censored cases. In this work, we are interested in the survival analysis of colon cancer patients. When abnormal cells develop in the colon or rectum, the disease is called colorectal cancer (CRC, or colon cancer). According to the American Cancer Society (ACS), an estimated 104,610 new colon cancer cases were expected in 2020, the majority in adults aged 50 years or older, along with an estimated 53,200 deaths due to colorectal cancer.
In CRC, tumors begin as noncancerous polyps in the inner lining of the colon or rectum. These polyps can grow into cancerous tumors that block the lymph vessels carrying cellular waste. Tumor cells then break away and spread to parts of the body distant from where the tumor started. This process is known as metastasis. The extent of metastasis at the time of diagnosis is described as the stage of the cancer2. Staging systems used in the literature include TNM (tumor, node, and metastasis) and SEER summary staging. The former is mostly used in clinical settings, while the latter is used for statistical analysis.
Survival analysis (duration modeling) models time-to-event data with the objective of investigating the effects of covariates on survival time. In many cases these effects are heterogeneous: covariates may strongly affect the probability of survival early in the study period and then wane, or show no effect at all, at later times3.
Many statistical software programs are available to analyze time-to-event data. With these tools, data can be modeled using nonparametric Kaplan-Meier (KM) estimation; parametric models such as the Weibull, exponential and log-logistic; and semiparametric approaches such as the Cox proportional hazards (Cox PH) model and quantile regression4. Most of the data analysis in this study was conducted using SAS®. Test results with a p-value below 5% were considered statistically significant.
If a strong homogeneous treatment effect can be assumed, parametric survival models or accelerated failure time (AFT) models are the best approach, because they provide a direct interpretation of covariate effects on the event time. However, the homogeneity assumption is rarely tenable in time-to-event data. The semiparametric Cox PH model has many advantages over the parametric approach. It models the effect of covariates on the hazard function, assuming that the hazard ratio between any two subjects is constant over time. Even when the PH assumption holds, the main difficulty is interpreting the estimated hazard ratio (HR); when the assumption fails, the estimated HR is misleading. This motivates the quantile survival regression (QSR) approach, which provides a dynamic, quantile-based relationship between the covariates and the survival time and whose interpretation is straightforward5, 6. In the current study, QSR models are developed at various quantiles of survival duration to model overall survival using the contributing covariates. The performance of these QSR models is compared with that of parametric, semiparametric and nonparametric models.
Quantile survival regression (QSR) measures the importance of covariates in modeling survival time at different quantiles of the survival time distribution. Because survival times are usually right-skewed, QSR models have been shown to provide more robust covariate estimates than the other modeling approaches, and they are particularly useful for exploring heterogeneous covariate effects.
Research on QSR is growing rapidly: a PubMed search using the keyword “quantile survival regression” returned 200 publications from 2015 to 2021. Among parametric, nonparametric and semiparametric survival models, Cox’s proportional hazards model (semiparametric) is the most often used for survival analysis. QSR, however, does not rely on Cox’s proportional hazards assumption, and it relates the outcome variable to the covariates using the full data.
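To make the quantile idea concrete, the following Python sketch (our illustration, not code from this study) implements the check function ρ_τ(u) = u(τ - 1{u < 0}) that underlies quantile regression (Koenker and Bassett, reference 5). With an intercept only, minimizing the summed check loss recovers a sample τ-quantile. The durations are hypothetical and uncensored; censored QSR methods, such as those listed in Table 2, modify this objective to account for censoring.

```python
import numpy as np


def check_loss(u, tau):
    """Check (pinball) function rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


def quantile_by_check_loss(y, tau):
    """Return the observation q minimizing sum_i rho_tau(y_i - q).

    The objective is convex and piecewise linear in q, so a minimizer
    is always attained at one of the observations; we search over them.
    """
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - q, tau).sum() for q in y]
    return float(y[int(np.argmin(losses))])


# Hypothetical, uncensored survival durations in months.
durations = [3, 8, 12, 20, 29, 45, 60, 91, 120, 140]
print(quantile_by_check_loss(durations, 0.25))
print(quantile_by_check_loss(durations, 0.75))
```

Asymmetric weights are the whole trick: residuals above q cost τ per unit and residuals below cost 1 - τ, so the minimizer balances the two sides at the τ-th position of the sorted data.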
MATERIALS AND METHODS:
The colon cancer data for this retrospective cohort study were obtained from the National Cancer Institute's Surveillance, Epidemiology and End Results (SEER) registry for the years 2004-2015. The SEER database consists of 13 cancer population registries, covering around 26% of the United States' population7. Patients who were alive at the end of the study period were treated as right-censored. The available data include patient demographics (age, gender, race, and marital status), tumor details (grade, size, and histology), nodal data (number of examined nodes, number of positive nodes), vital status and survival time.
In this work, we preprocessed the SEER colon tumor records to remove redundancies and missing data. The resulting data set had 30,251 records covering four race groups: Caucasians (88.9%), African Americans (10.4%), American Indians (0.3%) and other races (0.4%). Of the 30,251 patients included, 49.7% were male and 50.3% female. The mean survival duration was 52 months for both males and females. The mean age at diagnosis was 67.7 years with a standard deviation of 14 years. The follow-up period of the study spans 143 months, and the mean survival duration is 52.16 months with a standard deviation of 39 months. Overall, 56.85% of the subjects are censored and 43.15% died.
In univariate analysis, the qualitative variables race, gender and marital status were not statistically significant and were therefore dropped from survival modeling. The only qualitative variable retained in the modeling is histology (in situ, 38.34%; localized, 45.79%; distant, 15.88%). The chi-square statistic for histology is significant at the 0.05 level (p < 0.0001), indicating a significant departure from the hypothesized percentages. Stage-wise details, along with statistics of the variables used in this work, are given in Table 1. Table 2 summarizes published research that uses the quantile survival regression approach, including comparisons of QSR with traditional survival analysis approaches.
The histogram of the survival duration is given in Figure 1 (Left). Because survival times are often positively skewed, the median is a better summary of survival times than the mean. The histogram shows that shorter survival times are more frequent, indicating a high risk of early death for colon cancer patients. This is more evident from the cumulative distribution graph in Figure 1 (Right): the probability of surviving 30 months or fewer is approximately 25%, and the probability of surviving 90 months or fewer is approximately 75%. By 90 months, a colon cancer patient has a high chance of experiencing the event, death due to cancer. The failure function flattens at 0.6185, indicating that not all subjects had died by the end of follow-up.
Figure 1. Distribution of the Survival Duration (Left) and Failure Curve (Right)
Table 1. Stage-wise Colon Cancer Statistics of the Variables used in this Study
Stage | N (%) | Deaths (%) | Survival_Duration | Age | Tumor_Size | Nodes_Examined | Positive_Nodes
Stage 0 | 373 (1.23) | 77 (20.64) | 69.55 (40.00) | 66.45 (11.51) | 25.96 (27.63) | 13.10 (9.37) | 0.00 (0.05)
Stage 1 | 5854 (19.35) | 1596 (27.26) | 64.06 (38.49) | 68.72 (12.87) | 32.17 (27.35) | 15.45 (8.80) | 0.00 (0.00)
Stage 2A | 8499 (28.1) | 2968 (34.92) | 58.67 (38.29) | 69.89 (13.84) | 51.63 (34.09) | 17.48 (9.31) | 0.00 (0.00)
Stage 2B | 1294 (4.28) | 588 (45.44) | 45.58 (36.99) | 68.69 (14.38) | 65.46 (48.30) | 17.49 (8.89) | 0.00 (0.00)
Stage 3A | 1095 (3.62) | 328 (29.95) | 62.19 (39.00) | 65.25 (13.61) | 32.58 (23.02) | 15.87 (9.04) | 1.45 (0.86)
Stage 3B | 5365 (17.73) | 2256 (42.05) | 52.96 (38.53) | 67.11 (14.31) | 51.04 (32.65) | 17.50 (9.50) | 1.66 (1.15)
Stage 3C | 3524 (11.65) | 1860 (52.78) | 45.85 (36.81) | 65.86 (14.68) | 52.38 (34.16) | 19.61 (9.65) | 7.66 (5.07)
Stage 4 | 4247 (14.04) | 3379 (79.56) | 24.83 (25.93) | 64.92 (13.94) | 56.66 (40.08) | 16.83 (9.24) | 5.41 (6.32)
Variable columns show Mean (SD).
RESULTS AND DISCUSSION:
Parametric Models-Accelerated Failure Time Model:
Parametric survival modeling was performed to examine differences in survival across the stages at which patients were first diagnosed with colon cancer, after adjusting for age, tumor size, number of lymph nodes examined, number of positive lymph nodes, and histology.
To compare the fitted parametric survival models, we identified the best parametric model based on the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and log-likelihood values23. The best-fitting model is the one with the smallest AIC and BIC values and the largest log-likelihood. Results for the fitted parametric models are shown in Table 3. The gamma model has the lowest AIC and BIC values and the highest log-likelihood, and hence performs better than the other models. Parameter estimates for the gamma model are given in Table 4. Based on these estimates, the default Wald tests indicate that in-situ histology is not significant (p = 0.59), while the remaining variables are strongly significant (p < 0.0001). The negative estimated β-coefficient for distant histology shows that these patients have a shorter survival time than patients with localized histology. Age, tumor size and the number of positive nodes have strong effects on colon cancer survival; their negative estimates likewise indicate shorter survival. Stage 2B patients have a lower survival estimate than stage 3A patients.
Table 2. Published Research works using Quantile Survival Regression
Author(s) | Research Work
Ying et al.8 | A semiparametric procedure for median regression models with censored observations.
Yang9 | Median regression estimators based on weighted empirical survival and hazard functions.
Portnoy10 | Generalized the principle of the Kaplan-Meier estimate using a recursively reweighted estimator of a quantile regression process.
Carey et al.11 | Studied growth failure in pediatric AIDS using loess smoothing and penalized likelihood quantile regressions fit to model age-specific growth velocity distributions for gender-stratified cohorts.
Yin et al.12 | Quantile regression model with an estimating-equation approach for parameter estimation with right-censored correlated survival data.
Peng and Huang13 | A quantile regression approach for survival data subject to conditionally independent censoring; the estimators are computed using martingale-based estimating equations.
Cai Yuzhi14 | Established a quantile survival model for censored data, along with the survival density, survival and hazard functions of the survival time.
Fan C et al.15 | Power-transformed linear regression on quantile residual life for censored competing risks data.
Hsieh JJ et al.16 | Quantile regression based on a counting process approach under semi-competing risks data.
Xue et al.17 | Censored quantile regression, used together with the Cox proportional hazards model, permits a more sensitive analysis of time-to-event data.
Faradmal et al.18 | Used censored quantile regression (CQR) to provide in-depth insight into the multivariable association between prognostic factors and survival in breast cancer data.
Flemming et al.19 | Examined the association of time to surgery (TTS) with cancer-specific survival (CSS) and overall survival (OS) using multivariable Cox regression and quantile regression at 42 days and the 90th percentile.
Zarean et al.20 | Fitted censored quantile regression to model overall survival using adjusted effects of variables, and compared it with the Cox regression model.
Hong et al.21 | Quantile regression approach for the right-censored Boston Lung Cancer Survivor cohort dataset with low- or high-dimensional covariates.
Qiu et al.22 | Estimators computed with an augmented inverse probability weighting technique for a quantile regression model for survival data with missing censoring indicators.
Table 3. Goodness of fit results of Parametric Models
Model | Log-Likelihood | AIC | BIC
Weibull | -29290.46971 | 58610.94 | 58735.16
Gamma | -29216.39756 | 58464.8 | 58597.29
Log-Logistic | -29289.46342 | 58608.93 | 58733.14
Log-Normal | -29604.90039 | 59239.8 | 59364.02
Exponential | -29290.57883 | 58609.16 | 58725.09
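The model-selection rule can be checked against Table 3 with a few lines of Python. This is a sketch under stated assumptions: it uses the standard definition AIC = -2 log L + 2k, ranks BIC exactly as reported (the sample size the software used for BIC is not restated here), and back-calculates the parameter count k from the reported AIC, since k is not given in the text.

```python
# (log-likelihood, AIC, BIC) as reported in Table 3.
TABLE3 = {
    "Weibull": (-29290.46971, 58610.94, 58735.16),
    "Gamma": (-29216.39756, 58464.8, 58597.29),
    "Log-Logistic": (-29289.46342, 58608.93, 58733.14),
    "Log-Normal": (-29604.90039, 59239.8, 59364.02),
    "Exponential": (-29290.57883, 58609.16, 58725.09),
}


def aic(loglik, k):
    """Akaike Information Criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def implied_k(loglik, reported_aic):
    """Back-calculate the parameter count from a reported AIC (our assumption)."""
    return round((reported_aic + 2.0 * loglik) / 2.0)


def rank_models(column):
    """Order models by a criterion column (1 = AIC, 2 = BIC); smaller is better."""
    return sorted(TABLE3, key=lambda name: TABLE3[name][column])


for name, (ll, reported_aic, _) in TABLE3.items():
    k = implied_k(ll, reported_aic)
    print(f"{name:13s} k={k:2d} AIC={aic(ll, k):.2f}")
print("Best by AIC:", rank_models(1)[0])
print("Best by BIC:", rank_models(2)[0])
```

The implied counts (14 for the exponential, 15 for Weibull, log-logistic and log-normal, 16 for gamma) agree with the 14 non-reference coefficients in Table 4 plus zero, one or two distribution parameters, which supports reading the AIC column this way.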
Table 4. Analysis of Maximum Likelihood Estimates for Gamma Distribution
Parameter | Level | Estimate | Standard Error | Lower 95% Confidence Limit | Upper 95% Confidence Limit
Intercept | | 6.8704 | 0.0968 | 6.6805 | 7.0602
Age | | -0.0459 | 0.0008 | -0.0475 | -0.0443
Tumor_Size | | -0.0023 | 0.0002 | -0.0027 | -0.0019
Histology | Distant | -0.4438 | 0.0735 | -0.5878 | -0.2998
Histology | In Situ | 0.0195 | 0.036 | -0.051 | 0.09
Histology | Localized | 0 | . | . | .
Nodes_Examined | | 0.0237 | 0.0012 | 0.0213 | 0.0261
Stage | Stage 0 | 2.1123 | 0.1266 | 1.864 | 2.3605
Stage | Stage 1 | 1.5226 | 0.0849 | 1.3562 | 1.6891
Stage | Stage 2A | 1.2474 | 0.0798 | 1.0909 | 1.4039
Stage | Stage 2B | 0.7384 | 0.0799 | 0.5818 | 0.895
Stage | Stage 3A | 1.3403 | 0.0953 | 1.1535 | 1.5271
Stage | Stage 3B | 0.9164 | 0.0785 | 0.7626 | 1.0702
Stage | Stage 3C | 0.7925 | 0.078 | 0.6396 | 0.9454
Stage | Stage 4 | 0 | . | . | .
Positive_Nodes | | -0.0708 | 0.0025 | -0.0757 | -0.0658
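The Wald tests and effect directions in Table 4 can be reproduced from the estimates and standard errors alone. The Python sketch below assumes, as is standard for AFT models, that the gamma model is fitted on the log-time scale, so exp(β) is a time ratio: the factor by which a one-unit change in a covariate multiplies survival time. Only the standard library is used; the normal CDF comes from math.erf.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def wald_test(estimate, se):
    """Two-sided Wald test: z = estimate / SE, p = 2 * (1 - Phi(|z|))."""
    z = estimate / se
    return z, 2.0 * (1.0 - normal_cdf(abs(z)))


def time_ratio(estimate, se, z_crit=1.96):
    """exp(beta) with a 95% Wald interval, assuming a log-time AFT model."""
    return (math.exp(estimate),
            math.exp(estimate - z_crit * se),
            math.exp(estimate + z_crit * se))


# (estimate, standard error) taken from Table 4 (gamma model).
TABLE4 = {
    "Age": (-0.0459, 0.0008),
    "Histology Distant": (-0.4438, 0.0735),
    "Histology In Situ": (0.0195, 0.036),
    "Positive_Nodes": (-0.0708, 0.0025),
}

for name, (b, se) in TABLE4.items():
    z, p = wald_test(b, se)
    tr, lo, hi = time_ratio(b, se)
    print(f"{name:18s} z={z:7.2f} p={p:.4f} time ratio={tr:.3f} ({lo:.3f}, {hi:.3f})")
```

Under this reading, the distant-histology time ratio of about 0.64 means survival times roughly 36% shorter than for localized histology, and the in-situ p-value of about 0.59 matches the value quoted in the text.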
Nonparametric Analysis:
From the Kaplan-Meier (KM) estimates of the survival function, 868 patients were reported dead due to colon cancer in the interval [0, 1) months, and 205 observations were censored in the same interval. Censored observations do not change the survival estimate at the time they leave the study; they only reduce the number of patients at risk at later event times.
From the KM curve in Figure 2 (Left), the probability of surviving beyond 50 months is approximately 65%, which is consistent with the cumulative distribution curve. The hazard function in Figure 2 (Right) is high during the initial 40 to 50 months and then declines steeply to smaller values. At the beginning of the study we expect around 0.02 failures per month, while 25 months later, among those who survived, we expect 0.008 failures per month, a decline by a factor of about two and a half. Table 5 provides the first, second and third quartiles of the survival duration. The interval during which the first 25% of the population is expected to fail, [0, 29) months, is much shorter than the interval during which the second 25% is expected to fail, [29, 91). There is not enough failure data to estimate the time by which 75% of the population fails; this is indicated by “.” in Table 5. This supports our understanding that the hazard of failure is greatest early in the study.
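The product-limit calculation behind the KM curve and the quartiles in Table 5 can be sketched in a few lines of Python. The toy data below are hypothetical (not SEER records); event = 1 marks a death and 0 a censored observation. A quantile is read off as the smallest event time at which S(t) ≤ 1 - p; statistical packages differ slightly in how they treat an estimate exactly equal to 1 - p. When S(t) never falls that low, no estimate exists, which is the situation behind the “.” for the 75th percentile in Table 5.

```python
import numpy as np


def kaplan_meier(time, event):
    """Product-limit estimate of S(t) at each distinct event (death) time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(time >= t)           # subjects censored at t still count as at risk
        n_deaths = np.sum((time == t) & event)
        s *= 1.0 - n_deaths / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)


def km_quantile(event_times, surv, p):
    """Smallest event time with S(t) <= 1 - p, or None if never reached."""
    hits = np.nonzero(surv <= 1.0 - p + 1e-12)[0]
    return float(event_times[hits[0]]) if hits.size else None


# Hypothetical follow-up times (months) and event indicators.
time = [2, 3, 3, 5, 8, 8, 12, 15, 20, 24]
event = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
t, s = kaplan_meier(time, event)
for p in (0.25, 0.50, 0.75):
    print(p, km_quantile(t, s, p))
```

In this toy sample the curve stops at about 0.38 after the last death, so the 25th and 50th percentiles are defined but the 75th is not, mirroring the pattern of Table 5.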
Figure 2. Kaplan-Meier Survival Estimation (Left) and Hazard Function (Right)
Table 5. Summary Statistics for Time Variable Survival Duration
Quantile Estimates of Survival Times
Percent | Point Estimate | 95% CI Lower | 95% CI Upper
75 | . | . | .
50 | 91.000 | 89.000 | 93.000
25 | 29.000 | 28.000 | 30.000
Figure 3. Kaplan Meier Stage Stratification Survival Analysis Estimates and Negative Log of Estimated Survivor Functions
Figure 3 (Left) shows the stage-wise KM estimates. The survival probabilities for patients in Stage 0, Stage 1, Stage 3A and Stage 2A are higher than those for patients in Stage 3B, Stage 2B, Stage 3C and Stage 4. The log-rank, Wilcoxon and likelihood ratio tests for homogeneity provide strong evidence of differences among the survival curves of the stages (p < 0.0001). The same behavior is evident in the negative log survival estimate curves in Figure 3 (Right). None of the curves of the negative log survival estimate versus survival duration approximates a straight line through the origin, indicating that the exponential parametric model is not appropriate for these survival data. In the plot of the log of negative log of the estimated survivor function (Figure 4), several curves cross one another, violating the proportional hazards assumption.
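The log-rank test compares, at each death time, the observed number of deaths in a group with the number expected if all groups shared one survival curve. The Python sketch below is a two-group version on hypothetical data; the stage comparison in this study involves eight groups, whose k-group generalization gives a chi-square statistic with k - 1 = 7 degrees of freedom. The p-value uses the identity P(χ²₁ > c) = erfc(√(c/2)), so only NumPy and the standard library are needed.

```python
import math

import numpy as np


def logrank_two_group(time, event, group):
    """Two-sample log-rank test; returns (chi-square statistic, p-value)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    in_g1 = np.asarray(group) == 1
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):            # distinct death times
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & in_g1).sum()
        died = (time == t) & event
        d = died.sum()
        d1 = (died & in_g1).sum()
        o_minus_e += d1 - d * n1 / n             # observed minus expected deaths in group 1
        if n > 1:                                # hypergeometric variance contribution
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no information to compare the groups")
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # chi-square with 1 df


# Hypothetical data: group 1 dies earlier than group 0.
chi2, p = logrank_two_group(
    time=[1, 2, 3, 4, 5, 6, 7, 8],
    event=[1, 1, 1, 1, 1, 1, 1, 1],
    group=[1, 1, 1, 1, 0, 0, 0, 0],
)
print(round(chi2, 2), round(p, 4))
```

The Wilcoxon variant mentioned in the text uses the same observed-minus-expected terms but weights each death time by the number at risk, which emphasizes early differences.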
Figure 4. Stage-wise Log of Negative Log of Estimated Survival Function
Semiparametric Model-Proportional Hazards Model:
The Cox proportional hazards model24 was used to assess how survival duration depends on patient age, stage, tumor size, number of lymph nodes examined, number of positive lymph nodes, and histology. The Cox model converged, and the global tests (likelihood ratio, score and Wald) are all significant. The parameter estimates of the Cox regression model, along with the hazard ratios, are given in Table 6.
For each one-year increase in age, the hazard increases by about 4.6%. Patients whose histology is reported as distant have a 54% greater hazard, and in-situ patients a 4% lower hazard, than patients with localized histology. Compared with stage 4 patients, stage 0, stage 2B and stage 3A patients have 86%, 46% and 71.5% lower hazard rates, respectively. Each additional positive lymph node is associated with a 7% greater hazard rate. These conclusions are in line with our previous KM and parametric analyses.
Table 6. Cox PH Model Estimates
Parameter | Level | Parameter Estimate | Standard Error | Hazard Ratio
Age | | 0.04512 | 0.000744 | 1.046
Tumor_Size | | 0.00192 | 0.000184 | 1.002
Histology | Distant | 0.4298 | 0.06652 | 1.537
Histology | In Situ | -0.0382 | 0.03379 | 0.963
Nodes_Examined | | -0.02424 | 0.00117 | 0.976
Stage | Stage 0 | -1.93504 | 0.12079 | 0.144
Stage | Stage 1 | -1.38193 | 0.07701 | 0.251
Stage | Stage 2A | -1.10618 | 0.07186 | 0.331
Stage | Stage 2B | -0.62215 | 0.07089 | 0.537
Stage | Stage 3A | -1.25569 | 0.08829 | 0.285
Stage | Stage 3B | -0.8197 | 0.07087 | 0.441
Stage | Stage 3C | -0.70982 | 0.07039 | 0.492
Positive_Nodes | | 0.06475 | 0.00207 | 1.067
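The hazard ratios in Table 6 are exp(β), and the percentages quoted in the text are 100·(HR - 1). The short Python sketch below recomputes them from the reported estimates, together with 95% Wald limits exp(β ± 1.96·SE); those limits are not reported in Table 6 and are our addition.

```python
import math

# (estimate, standard error) as reported in Table 6 (Cox PH model).
TABLE6 = {
    "Age": (0.04512, 0.000744),
    "Histology Distant": (0.4298, 0.06652),
    "Histology In Situ": (-0.0382, 0.03379),
    "Stage 0": (-1.93504, 0.12079),
    "Stage 2B": (-0.62215, 0.07089),
    "Stage 3A": (-1.25569, 0.08829),
    "Positive_Nodes": (0.06475, 0.00207),
}


def hazard_ratio(beta, se, z_crit=1.96):
    """Hazard ratio exp(beta) with a 95% Wald confidence interval."""
    return (math.exp(beta),
            math.exp(beta - z_crit * se),
            math.exp(beta + z_crit * se))


def percent_change(hr):
    """Percentage change in the hazard implied by a hazard ratio."""
    return 100.0 * (hr - 1.0)


for name, (beta, se) in TABLE6.items():
    hr, lo, hi = hazard_ratio(beta, se)
    print(f"{name:18s} HR={hr:.3f} ({lo:.3f}, {hi:.3f}) change={percent_change(hr):+.1f}%")
```

For example, the age HR of 1.046 corresponds to a 4.6% higher hazard per year, and the stage 0 HR of 0.144 to an 85.6% lower hazard than stage 4.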
Quantile Regression Model– Examining Potential Heterogeneous Effects:
As discussed above, quantile regression is well suited to skewed data. Because the survival durations in our data are skewed, we model them using quantile regression to learn how the extreme survival times are related to the covariates. Using QSR, we fit a linear model for the log of survival duration of the colon cancer patients with the covariates age, stage, tumor size, number of lymph nodes examined, number of positive lymph nodes, and histology. About 57% (56.85%) of the observations are censored. Table 7 provides the parameter estimates for quantiles 0.1 to 0.4. Each requested quantile has its own set of parameter estimates and confidence limits, and the confidence limits are computed by resampling. The QSR results and the plots in Figure 5 do not report estimates or 95% confidence bands beyond the 0.4 quantile, as there are no events at or after that time point. For quantiles 0.5 through 0.7, because the estimated survival function does not fall below 0.38, standard errors and confidence bounds cannot be obtained.
The behavior of the covariate coefficients is shown in Figure 5, which plots the estimated regression parameters against the quantiles. The effects of tumor size and age are negative and small over the lower quantiles. The tumor size estimate gradually increases as we move from lower to higher quantiles: it is -0.006 at the 0.1 quantile, rises to -0.004 at the 0.4 quantile (Table 7), and continues to increase up to the 0.7 quantile. The age effect, in contrast, levels off: its estimate is -0.051 at the 0.1 quantile and stays around -0.042 at the 0.4 and higher quantiles. Among the other covariates, we notice a positive trend for stage 2B and stage 3C and a negative trend for positive nodes. The positive-nodes estimate follows a negative trend up to the 0.4 quantile and then begins to increase. A non-constant coefficient curve indicates heterogeneity in the data. A QSR equation is interpreted in the same way as an ordinary regression equation, but for the chosen quantile.
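To illustrate how a linear quantile regression line is obtained, here is a small, exact Python sketch for a single covariate. It minimizes the summed check loss over all lines through pairs of observations, which is exhaustive because the quantile-regression linear program has an optimal solution that interpolates as many observations as there are parameters. This is a teaching example under strong assumptions: it ignores censoring (the QSR models in this study use a censored-data estimator), and its O(n³) cost limits it to tiny samples. In the study's setting, y would be the log of survival duration.

```python
import itertools

import numpy as np


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


def quantile_line(x, y, tau):
    """Exact fit of the tau-th conditional quantile line y = a + b * x.

    Uncensored data only. Searches all lines through two observations
    with distinct x values and keeps the one with the smallest check loss.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_loss, best_a, best_b = np.inf, None, None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = check_loss(y - (a + b * x), tau).sum()
        if loss < best_loss:
            best_loss, best_a, best_b = loss, a, b
    if best_a is None:
        raise ValueError("need at least two distinct x values")
    return float(best_a), float(best_b)


# Hypothetical data with one extreme response at x = 4.
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 100]
a, b = quantile_line(x, y, tau=0.5)
print(a, b)  # the median line ignores the extreme point: a = 1.0, b = 2.0
```

The median line passes through the four regular points and is unaffected by the extreme response, which is the robustness property that motivates quantile methods for skewed survival times.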
Table 7. Quantile Survival Regression Estimates
Parameter | τ = 0.1 | τ = 0.2 | τ = 0.3 | τ = 0.4
Intercept | 5.208 (0.201) | 5.512 (0.150) | 5.967 (0.152) | 6.101 (0.139)
Age | -0.051 (0.001) | -0.045 (0.001) | -0.043 (0.001) | -0.042 (0.001)
Tumor_Size | -0.006 (0.001) | -0.006 (0.001) | -0.004 (0.001) | -0.004 (0.001)
Distant | -0.395 (0.154) | -0.352 (0.116) | -0.484 (0.118) | -0.371 (0.100)
In Situ | -0.009 (0.063) | 0.041 (0.042) | 0.060 (0.041) | 0.035 (0.039)
Localized | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000)
Nodes_Examined | 0.024 (0.002) | 0.023 (0.002) | 0.023 (0.002) | 0.023 (0.001)
Stage 0 | 2.188 (0.239) | 2.056 (0.156) | 1.914 (0.136) | 1.943 (0.117)
Stage 1 | 1.750 (0.188) | 1.659 (0.134) | 1.374 (0.129) | 1.416 (0.115)
Stage 2A | 1.395 (0.179) | 1.406 (0.129) | 1.172 (0.124) | 1.235 (0.108)
Stage 2B | 0.571 (0.194) | 0.653 (0.136) | 0.563 (0.136) | 0.715 (0.112)
Stage 3A | 1.247 (0.218) | 1.368 (0.145) | 1.271 (0.134) | 1.444 (0.120)
Stage 3B | 0.894 (0.168) | 0.929 (0.129) | 0.790 (0.118) | 0.911 (0.107)
Stage 3C | 0.692 (0.161) | 0.742 (0.123) | 0.611 (0.124) | 0.779 (0.103)
Stage 4 | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000)
Positive_Nodes | -0.062 (0.006) | -0.070 (0.006) | -0.076 (0.005) | -0.077 (0.004)
Entries are Estimate (Standard Error).
τ = Quantile
CONCLUSION:
The main aim of this paper was to study the factors affecting the survival of colon cancer patients. For this, we employed parametric, semiparametric, nonparametric, and quantile survival regression approaches. Five parametric models (Weibull, gamma, log-normal, log-logistic and exponential) were fitted to the patients' survival data, and the gamma model performed best among them. Compared with patients with localized tumors, patients with distant histology have lower survival. Compared with stage 4, all other stages show longer survival, although stage 2B patients have shorter survival than stage 3A and 3B patients. The data were then analyzed using the nonparametric Kaplan-Meier approach and the semiparametric Cox regression approach. In the KM analysis, stage 2B has a median survival of 117 months, while stage 4 has a much shorter median of 20 months. No median survival is reported for stage 0, stage 1 and stage 3A patients, because the KM estimates for these groups never fell below survival probabilities of 63.78%, 52.45% and 54.08%, respectively.
Figure 5. Quantile Survival Parameter Estimates
From the Cox regression results, each one-year increase in age corresponds to a 0.045-unit increase in the log relative hazard, i.e., a 4.6% increase in the hazard: the hazard for a person is 1.05 times that of a person one year younger. Relative to localized histology, distant histology is associated with a higher hazard (HR 1.537), whereas the in-situ group has a slightly lower hazard (HR 0.963), a difference that is small relative to its standard error.
Finally, we modeled the survival data using quantile survival regression (QSR), a distribution-free approach. QSR is very useful when interest lies in modeling the survival time itself and when the effects of covariates on the survival distribution differ across covariate levels. With QSR, inference about the regression parameters for a particular quantile depends only on the conditional distribution near that quantile. In addition, QSR coefficients have a direct interpretation as changes in a quantile of the (log) survival time distribution.
Consider the 0.1 quantile. Because the model is fitted to log survival duration, the stage 0 coefficient of 2.188 means that the 0.1 quantile of survival time for stage 0 patients is exp(2.188) ≈ 8.9 times that of stage 4 patients, a significant difference. The estimates for all stages are significant. Similarly, each one-unit increase in tumor size multiplies the 0.1 quantile of survival time by exp(-0.006) ≈ 0.994, i.e., shortens it by about 0.6%. All estimates except in situ are significant. Estimates above the 0.7 quantile are not produced. Such quantile-specific details cannot be obtained directly from a Cox or a parametric model.
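Because the QSR models are fitted to log survival duration, a coefficient β at quantile τ multiplies the τ-th quantile of survival time by exp(β). The Python sketch below applies this rule to the stage 0 (versus stage 4) and tumor-size coefficients from Table 7 across the four reported quantiles. It uses only the point estimates and says nothing about their uncertainty.

```python
import math

# Point estimates from Table 7, keyed by quantile tau.
STAGE0 = {0.1: 2.188, 0.2: 2.056, 0.3: 1.914, 0.4: 1.943}
TUMOR_SIZE = {0.1: -0.006, 0.2: -0.006, 0.3: -0.004, 0.4: -0.004}


def quantile_ratio(beta):
    """Factor multiplying the tau-th survival-time quantile under a log-linear QSR model."""
    return math.exp(beta)


for tau in sorted(STAGE0):
    stage_factor = quantile_ratio(STAGE0[tau])
    tumor_pct = 100.0 * (quantile_ratio(TUMOR_SIZE[tau]) - 1.0)
    print(f"tau={tau}: stage 0 vs 4 x{stage_factor:.2f}, tumor size per unit {tumor_pct:+.2f}%")
```

The stage 0 factor shrinks from about 8.9 at τ = 0.1 to roughly 7 at τ = 0.3 and 0.4, one concrete view of the heterogeneity that a single Cox hazard ratio cannot show.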
The parametric regression coefficients are interpreted as effects on the mean or median of the survival time, whereas the QSR coefficients apply to specified quantiles of the survival time. Unlike these two methods, the Cox proportional hazards model models the hazard function; it requires no parametric assumption about the baseline hazard and can also incorporate time-dependent covariates. We conclude that QSR models may be adopted if one wishes to achieve good prediction for the lower quantiles of the colon cancer data, while the Cox model may be preferred for overall prediction performance.
CONFLICT OF INTEREST:
The authors have no conflicts of interest regarding this research work.
REFERENCES:
1. Mudunuru V. Comparison of activation functions in multilayer neural networks for stage classification in breast cancer. Neural, Parallel, and Scientific Computations. 2016; 24:83-96.
2. Ahmed FE, Vos PW, Holbert D. Modeling survival in colon cancer: a methodological review. Molecular Cancer. 2007 Dec; 6(1):1-2.
3. Singh R, Mukhopadhyay K. Survival analysis in clinical trials: Basics and must know areas. Perspectives in clinical research. 2011 Oct; 2(4):145.
4. Allison PD. Survival analysis using SAS: a practical guide. SAS Institute; 2010 Mar 29.
5. Koenker R, Bassett Jr G. Regression quantiles. Econometrica: journal of the Econometric Society. 1978 Jan 1:33-50.
6. Koenker R, Geling O. Reappraising medfly longevity: a quantile regression survival analysis. Journal of the American Statistical Association. 2001 Jun 1; 96(454):458-68.
7. Howlader N, Noone AM, Krapcho M, Miller D, Bishop K, Altekruse SF, Kosary CL, Yu M, Ruhl J, Tatalovich Z, Mariotto A. SEER Cancer Statistics Review, 1975–2013. Bethesda, MD: National Cancer Institute; 2016.
8. Ying Z, Jung SH, Wei LJ. Survival analysis with median regression models. Journal of the American Statistical Association. 1995 Mar 1; 90(429):178-84.
9. Yang S. Censored median regression using weighted empirical survival and hazard functions. Journal of the American Statistical Association. 1999 Mar 1; 94(445):137-45.
10. Portnoy S. Censored regression quantiles. Journal of the American Statistical Association. 2003 Dec 1; 98(464):1001-12.
11. Carey VJ, Yong FH, Frenkel LM, McKinney RM. Growth velocity assessment in paediatric AIDS: smoothing, penalized quantile regression and the definition of growth failure. Statistics in Medicine. 2004 Feb 15; 23(3):509-26.
12. Yin G, Cai J. Quantile regression models with multivariate failure time data. Biometrics. 2005 Mar; 61(1):151-61.
13. Peng L, Huang Y. Survival analysis with quantile regression models. Journal of the American Statistical Association. 2008 Jun 1; 103(482):637-49.
14. Cai Y. A quantile survival model for censored data. Australian & New Zealand Journal of Statistics. 2013 Jun; 55(2):155-72.
15. Fan C, Zhang F, Zhou Y. Power-transformed linear regression on quantile residual life for censored competing risks data. Communications in Statistics-Theory and Methods. 2016 Oct 17; 45(20):5884-905.
16. Hsieh JJ, Wang HR. Quantile regression based on counting process approach under semi-competing risks data. Annals of the Institute of Statistical Mathematics. 2018 Apr; 70(2):395-419.
17. Xue X, Xie X, Strickler HD. A censored quantile regression approach for the analysis of time to event data. Statistical methods in medical research. 2018 Mar; 27(3):955-65.
18. Faradmal J, Roshanaei G, Mafi M, Sadighi-Pashaki A, Karami M. Application of censored quantile regression to determine overall survival related factors in breast cancer. Journal of research in health sciences. 2016; 16(1):36.
19. Flemming JA, Nanji S, Wei X, Webber C, Groome P, Booth CM. Association between the time to surgery and survival among patients with colon cancer: a population-based study. European Journal of Surgical Oncology (EJSO). 2017 Aug 1; 43(8):1447-55.
20. Zarean E, Mahmoudi M, Azimi T, Amini P. Determining Overall Survival and Risk Factors in Esophageal Cancer Using Censored Quantile Regression. Asian Pacific journal of cancer prevention: APJCP. 2018; 19(11):3081.
21. Hong HG, Christiani DC, Li Y. Quantile regression for survival data in modern cancer research: expanding statistical tools for precision medicine. Precision clinical medicine. 2019 Jun 1; 2(2):90-9.
22. Qiu Z, Ma H, Chen J, Dinse GE. Quantile regression models for survival data with missing censoring indicators. Statistical methods in medical research. 2021 May; 30(5):1320-31.
23. Mudunuru VR. Modeling and Survival Analysis of Breast Cancer: A Statistical, Artificial Neural Network, and Decision Tree Approach. University of South Florida; 2016.
24. Klein JP, Zhang MJ. Survival analysis, software. Encyclopaedia of biostatistics. 2005 Jul 15; 8.
Received on 11.11.2021 Modified on 17.03.2022
Accepted on 20.06.2022 © RJPT All right reserved
Research J. Pharm. and Tech 2023; 16(3):1401-1408.
DOI: 10.52711/0974-360X.2023.00231